Abstract

Local protein structure analysis informs protein structure analysis as
a whole and has been used successfully in protein structure prediction
and related tasks. Proteins have recurring structural features, such as
helix caps and beta turns, which often have strong amino acid sequence
preferences. The challenge for local structure analysis has been the
identification and assignment of such common short structural motifs.

This paper proposes a new mathematical framework for the analysis of
the local structure of proteins, in which local conformations of protein
backbones are described using the differential geometry of folded
tetrahedron sequences. Using the framework, we can capture the recurring
structural features without any structural templates, which makes local
structure analysis not only simpler but also more objective.
Programs and examples are available from http://www.genocript.com.

AMS Subject Classification: 52C99, 92B99 

Key Words and Phrases: Discrete differential geometry - Tetrahedron
sequence - Local protein structure

1 Introduction 

A protein is a sequence of amino acids that folds into a unique
three-dimensional structure in nature. One can identify a protein with the
polygonal chain obtained by connecting the centers of adjacent amino acids.
Since the functional properties of proteins are largely determined by their
structure, protein structure analysis is crucial to the study of proteins.

Local protein structure analysis informs protein structure analysis as a
whole and has been used successfully in protein structure prediction and
related tasks. Proteins have recurring structural features, such as helix
caps and beta turns, which often have strong amino acid sequence
preferences. The challenge for local structure analysis has been the
identification and assignment of such common short structural motifs
([1], [2], [3], [4], [5]). Identification involves a description of protein
backbone conformation and definitions of the structural motifs. Assignment
is not a trivial task either, because structures observed in nature vary
from the ideal ones.

Figure 1: Introduction. (a): Protein (transferase 1RKL) and its structural
features. (b): Folding of a tetrahedron sequence. (c): Local features and the
corresponding 5-tile codes (see Section 4).

In this paper, we introduce a new differential geometrical approach to local
structure analysis. Many differential geometrical descriptions of the surface
of protein molecules are known (to name a few, [6], [7]). Protein backbone
structure, however, is usually studied via classification ([8], [9]), and a
differential geometrical approach has rarely been taken so far.

One of the few is the early work of [10], which described protein backbones
as polygonal chains in which each line segment corresponds to the virtual
bond between consecutive α-carbons. In contrast, we describe the local
conformation of protein backbones using folded tetrahedron sequences
(Figure 1 (b)).

As for the shape of protein backbones, [13] proposed the notion of
alpha-shape and [11] examined geometric restrictions on polygonal protein
chains. Moreover, [12] reviewed topological knots in protein structure.

2 Differential geometry of triangles 

For simplicity, we first consider the differential geometry of triangles.

2.1 Basic ideas

Let's consider the unit cube $[0,1]^3$ in the three-dimensional Euclidean
space $\mathbb{R}^3$ and divide each of the three facets which contain
$(0,0,0)$ into two triangles along a diagonal, as shown in figure 2 (a).
Then, if we pile the cubes up in the direction of $(-1,-1,-1)$, we obtain
"peaks and valleys" of cubes, where the division of the facets of each cube
makes up a division of the surface of the peaks and valleys (figure 2 (b)
top). A "flow" of triangles in $\mathbb{R}^2$ is obtained by projecting the
surface onto a hyperplane (figure 2 (b) bottom). For example, the grey
"slant" triangles on the surface specify the closed trajectory of the grey
"flat" triangles on the hyperplane.

Figure 2: Basic ideas. (a): Division of facets of the unit cube $[0,1]^3$.
(b): "Peaks and valleys" of cubes and its projection on a hyperplane.
(c): Projection $\pi$ from the collection $S$ of all the "slant" triangles
to the collection $B$ of all the "flat" triangles.

In the following, we use monomial notation to denote points and triangles
in $\mathbb{R}^3$. That is, we denote the point $(l,m,n) \in \mathbb{R}^3$
by the monomial $x_1^l x_2^m x_3^n \in \mathbb{Z}[x_1,x_2,x_3]$, and the
triangle of vertices $(l,m,n)$, $(l+1,m,n)$, $(l+1,m+1,n) \in \mathbb{R}^3$
is denoted by $x_1^l x_2^m x_3^n[x_1x_2]$. For example, $a[x_ix_j]$ is the
triangle of vertices $a$, $ax_i$, and $ax_ix_j \in \mathbb{Z}[x_1,x_2,x_3]$
(figure 2 (c)).

2.2 Tangent bundle over flat triangles 

Let $\pi$ be the projection of the collection $S$ of all the slant triangles
onto the collection $B$ of all the flat triangles along the direction
$(-1,-1,-1)$, where the image of $a[x_ix_j] \in S$ is denoted by
$|a[x_ix_j]|$ (figure 2 (c)). Then, the projection $\pi$ induces a tangent
bundle-like structure $TB$ over $B$, where the gradient of a slant triangle
is defined as follows:

Definition 1. The gradient $Da[x_ix_j]$ of $a[x_ix_j] \in S$ is the monomial
$x_ix_j \in \mathbb{Z}[x_1,x_2,x_3]$. In particular, there is a one-to-one
correspondence between $TB$ and $\{x_1x_2, x_1x_3, x_2x_3\} \times B$. We
indicate the gradient value over a flat triangle by a bold edge as shown in
figure 3 (a).

For example, the slant triangles $a[x_1x_2]$, $ax_1[x_2x_3]$, and
$a/x_3[x_3x_1] \in S$ are projected onto the same flat triangle
$|a[x_1x_2]| \in B$ and their gradients are $x_1x_2$, $x_2x_3$, and
$x_1x_3$ respectively (figure 3 (a)).
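The example above can be checked mechanically. The following is a minimal sketch, assuming that a monomial is stored as its tuple of exponents and that two lattice points have the same image under the projection exactly when they differ by a multiple of $(1,1,1)$. The helper names `shift`, `slant`, and `flat` are illustrative and do not come from the paper or its programs.

```python
def shift(a, i, delta):
    """Multiply (delta=+1) or divide (delta=-1) monomial a by x_i (1-based i)."""
    e = list(a)
    e[i - 1] += delta
    return tuple(e)


def slant(a, i, j):
    """Vertices a, a*x_i, a*x_i*x_j of the slant triangle a[x_i x_j]."""
    v1 = shift(a, i, 1)
    v2 = shift(v1, j, 1)
    return (a, v1, v2)


def flat(vertices):
    """Image under projection along (1, 1, 1): subtract each vertex's minimum
    exponent, which gives one canonical representative per projected point."""
    return frozenset(tuple(c - min(v) for c in v) for v in vertices)


a = (0, 0, 0)                          # the monomial 1, used as base point
t1 = slant(a, 1, 2)                    # a[x1 x2],     gradient x1 x2
t2 = slant(shift(a, 1, 1), 2, 3)       # a x1 [x2 x3], gradient x2 x3
t3 = slant(shift(a, 3, -1), 3, 1)      # a/x3 [x3 x1], gradient x1 x3

same_image = flat(t1) == flat(t2) == flat(t3)
print(same_image)
```

The three slant triangles carry three different gradients but share one flat triangle, which is the one-to-one correspondence between $TB$ and the set of gradient values times $B$ stated in Definition 1.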

Then, a gradient value over a flat triangle specifies a local trajectory at the 
flat triangle as follows: 

Definition 2. The local trajectory defined by $a[x_ix_j] \in S$ at
$|a[x_ix_j]| \in B$ is the three consecutive flat triangles
$\{|ax_i[x_jx_i]|, |a[x_ix_j]|, |a/x_j[x_jx_i]|\} \subset B$. As
figure 3 (b) shows, these are the adjacent triangles connected along the
direction of the bold edge of $|a[x_ix_j]|$. The local trajectory is
specified uniquely by the gradient of $a[x_ix_j]$.

Figure 3: Differential structure. (a): Gradient values of slant triangles
over $|a[x_1x_2]| \in B$. From left to right, $a[x_1x_2]$ whose gradient is
$x_1x_2$, $ax_1[x_2x_3]$ whose gradient is $x_2x_3$, and $a/x_3[x_3x_1]$
whose gradient is $x_1x_3$. (b): The local trajectory specified by
$a[x_1x_2] \in S$ at $|a[x_1x_2]| \in B$: $|ax_1[x_2x_1]|$ (downward) and
$|a/x_2[x_2x_1]|$ (upward) $\in B$. (c): Smoothness condition at
$|a[x_1x_2]| \in B$ (white) specified by $a[x_1x_2] \in S$. The next
triangle $|ax_1[x_2x_1]| \in B$ (grey) could assume either $x_1x_2$ or
$x_2x_3$ as gradient.
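Definition 2 can be turned into a few lines of code. This is a sketch under the same assumptions as the monomial notation (exponent tuples, projection along $(1,1,1)$); the function names are hypothetical. It returns the three flat triangles in the order in which the definition lists them and checks that neighbouring triangles share an edge.

```python
def shift(a, i, delta):
    """Multiply (delta=+1) or divide (delta=-1) monomial a by x_i (1-based i)."""
    e = list(a)
    e[i - 1] += delta
    return tuple(e)


def slant(a, i, j):
    """Vertices a, a*x_i, a*x_i*x_j of the slant triangle a[x_i x_j]."""
    v1 = shift(a, i, 1)
    return (a, v1, shift(v1, j, 1))


def flat(vertices):
    """Image under projection along (1, 1, 1), as a set of canonical points."""
    return frozenset(tuple(c - min(v) for c in v) for v in vertices)


def local_trajectory(a, i, j):
    """The flat triangles |a x_i [x_j x_i]|, |a[x_i x_j]|, |a/x_j [x_j x_i]|."""
    return (
        flat(slant(shift(a, i, 1), j, i)),
        flat(slant(a, i, j)),
        flat(slant(shift(a, j, -1), j, i)),
    )


a = (0, 0, 0)
# For a[x1 x2], the figure 3 (b) caption calls the first neighbour "downward"
# and the last one "upward".
down, here, up = local_trajectory(a, 1, 2)
print(len(down & here), len(up & here))  # shared vertices with each neighbour
```

Each neighbour shares exactly two vertices, i.e. one edge, with the middle triangle, and the upward neighbour coincides with $|a[x_1x_3]|$.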

Now we impose a kind of "smoothness condition" as shown in figure 3 (c).
That is, each flat triangle assumes one of two gradient values, which are
determined naturally by the gradient of the preceding triangle. Suppose that
the gradient at the current triangle $|a[x_1x_2]| \in B$ is $x_1x_2$ and the
gradient at the next triangle $|ax_1[x_2x_1]| \in B$ is $x_1x_3$. Then the
two flat triangles $|a[x_1x_2]|$ and $|ax_1[x_2x_1]|$ are separated by the
bold edge of $|ax_1[x_2x_1]|$ (figure 3 (c) right). We exclude this case and
permit only $x_1x_2$ or $x_2x_3$ as the gradient of $|ax_1[x_2x_1]|$.

As an example, let's consider the peaks and valleys shown in figure 2 (b),
which are specified by three peaks $a$, $b = a/x_2$, and
$c = ax_1x_2/x_3 \in \mathbb{Z}[x_1,x_2,x_3]$ (figure 4). Peaks and valleys
define a "smooth" vector field $V$ on $B$ by the following mapping:

$$V : B \to \{x_1x_2, x_1x_3, x_2x_3\}, \qquad V(|a[x_ix_j]|) := x_ix_j,$$

where $a[x_ix_j] \in S$ is the slant triangle on the surface of the peaks and
valleys over $|a[x_ix_j]| \in B$.

Let's start from triangle $|a[x_1x_2]|$ (grey) and move downward:
$t[0] = |a[x_1x_2]|$ and $V(t[0]) = x_1x_2$. Then, the gradient $V(t[0])$
specifies the local trajectory $\{|ax_1[x_2x_1]|, |a[x_1x_2]|, |a[x_1x_3]|\}$
at $t[0]$ (note that $|a/x_2[x_2x_1]| = |a[x_1x_3]|$). Since we move
downward, the next triangle $t[1]$ is $|ax_1[x_2x_1]|$ and we obtain
$V(t[1]) = x_1x_2$. Then, the gradient $V(t[1])$ specifies the local
trajectory $\{|ax_1x_2[x_1x_2]|, |ax_1[x_2x_1]|, |a[x_1x_2]|\}$ at $t[1]$,
and the next triangle $t[2]$ is $|ax_1x_2[x_1x_2]|$. Continuing the process,
we obtain the closed trajectory of length 10.

  Trajectory                              Gradient               2nd Deriva.
  $t[0] = |a[x_1x_2]|$                    $V(t[0]) = x_1x_2$     $DV(t[0]) = D$
  $t[1] = |ax_1[x_2x_1]|$                 $V(t[1]) = x_1x_2$     $DV(t[1]) = D$
  $t[2] = |ax_1x_2[x_1x_2]|$              $V(t[2]) = x_1x_2$     $DV(t[2]) = D$
  $t[3] = |ax_1^2x_2/x_3[x_3x_2]|$        $V(t[3]) = x_2x_3$     $DV(t[3]) = U$
  ...
  $t[9] = |a[x_1x_3]|$                    $V(t[9]) = x_1x_3$     $DV(t[9]) = U$

Figure 4: Closed trajectory of the vector field specified by three peaks
$a$, $b = a/x_2$, and $c = ax_1x_2/x_3 \in \mathbb{Z}[x_1,x_2,x_3]$.



2.3 Encoding of the shape of trajectories 

Finally, let's consider the variation of the gradient along a trajectory.
Thanks to the smoothness condition, the variation of the gradient, i.e., the
"second derivative", along a triangle trajectory is given as a binary-valued
sequence.

Definition 3. The derivative $DV$ of the vector field $V$ along a trajectory
$\{t[i]\}$ is defined as follows:

$$DV : B \to \{U, D\}, \qquad
DV(t[i]) := \begin{cases}
DV(t[i-1]) & \text{if } V(t[i]) = V(t[i-1]), \\
-DV(t[i-1]) & \text{otherwise,}
\end{cases}$$

where $-U := D$ and $-D := U$. In words, change the value if the gradient
changes.

As an example, let's consider the trajectory of figure 4 again. First, set
any initial value: $DV(t[0]) = D$. Then, since the first two triangles $t[0]$
and $t[1]$ have the same gradient, $DV(t[1])$ is also $D$. The value of the
second derivative remains $D$ until $t[3]$, where it changes to $U$ since the
gradient of $t[3]$ is different from that of $t[2]$.

Continuing the process, we obtain a binary sequence of length 10,
DDDUDUUUDU, which describes the shape of the trajectory.
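The recursion of Definition 3 is short enough to state as code. The sketch below assumes the gradients along a trajectory are already known and are given as a list of labels; the function name `second_derivative` and the string labels are illustrative choices, not part of the paper's programs. Only the first four gradients of the figure 4 trajectory are listed in the text, so the example stops there.

```python
def second_derivative(gradients, initial="D"):
    """Return the U/D sequence of Definition 3 for a list of gradient labels.

    The value is carried over while the gradient stays the same and is
    flipped (U <-> D) whenever the gradient changes.
    """
    if not gradients:
        return ""
    flip = {"U": "D", "D": "U"}
    values = [initial]
    for previous, current in zip(gradients, gradients[1:]):
        values.append(values[-1] if current == previous else flip[values[-1]])
    return "".join(values)


# Gradients of t[0], t[1], t[2], t[3] as listed in figure 4.
prefix = second_derivative(["x1x2", "x1x2", "x1x2", "x2x3"])
print(prefix)
```

The result depends only on where the gradient changes, not on which gradient value is assumed, so the same function applies unchanged to any trajectory once its gradients are known.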



3 Differential geometry of tetrahedrons 

Similarly, we obtain a flow of tetrahedrons in $\mathbb{R}^3$ by considering
peaks and valleys of 4-cubes in $\mathbb{R}^4$. In this case, each trajectory
of tetrahedrons can be obtained by folding a tetrahedron sequence which
satisfies the following conditions (figure 1 (b)): (i) each tetrahedron
consists of four short edges and two long edges, where the ratio of the
lengths is $\sqrt{3}/2$, and (ii) successive tetrahedrons are connected via
a long edge and have a rotational freedom around the edge. In particular, we
can compute the differential structure on a trajectory without considering
4-cubes.

Figure 5: Tangent bundle. (a): Projection $\pi$ from the collection $S$ of
all the slant tetrahedrons to the collection $B$ of all the flat
tetrahedrons. (b): Gradient values of slant tetrahedrons over
$|a[x_1x_2x_3]| \in B$. From left to right, $a/x_4[x_4x_1x_2]$ whose gradient
is $x_1x_2x_4$, $a[x_1x_2x_3]$ whose gradient is $x_1x_2x_3$,
$ax_1[x_2x_3x_4]$ whose gradient is $x_2x_3x_4$, and $ax_1x_2[x_3x_4x_1]$
whose gradient is $x_1x_3x_4$. The arrows of slant tetrahedrons indicate the
direction of "down" in $\mathbb{R}^4$.
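Condition (i) can be checked numerically. The sketch below assumes that the projection along $(-1,-1,-1,-1)$ is the orthogonal projection onto the hyperplane perpendicular to that direction; the paper does not state the hyperplane explicitly, so this is an assumption, and the function name is illustrative. It projects the slant tetrahedron with vertices $1$, $x_1$, $x_1x_2$, $x_1x_2x_3$ and lists its six edge lengths.

```python
import itertools
import math


def project_to_hyperplane(v):
    """Orthogonal projection onto the hyperplane perpendicular to (1, 1, 1, 1)."""
    mean = sum(v) / len(v)
    return tuple(c - mean for c in v)


# Vertices of the slant tetrahedron a[x1 x2 x3] with a = 1, as lattice points.
vertices = [(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0)]
projected = [project_to_hyperplane(v) for v in vertices]

# All six pairwise distances between the projected vertices, sorted.
lengths = sorted(math.dist(p, q) for p, q in itertools.combinations(projected, 2))
short, long_ = lengths[0], lengths[-1]
print(lengths)
print(short / long_, math.sqrt(3) / 2)
```

Under this assumption the flat tetrahedron has four edges of length $\sqrt{3}/2$ and two edges of length 1, which agrees with the ratio stated in condition (i).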

3.1 Tangent bundle over flat tetrahedrons 

Let's consider the 4-cube $[0,1]^4$ in the four-dimensional Euclidean space
$\mathbb{R}^4$. The facets of a 4-cube are three-dimensional unit cubes, and
we divide each of the four facets which contain $(0,0,0,0)$ into six
tetrahedrons along a diagonal, as shown in figure 5 (a) top.

In the following, we denote the point $(k,l,m,n) \in \mathbb{R}^4$ by the
monomial $x_1^k x_2^l x_3^m x_4^n \in \mathbb{Z}[x_1,x_2,x_3,x_4]$, and the
tetrahedron of vertices $(k,l,m,n)$, $(k+1,l,m,n)$, $(k+1,l+1,m,n)$,
$(k+1,l+1,m+1,n) \in \mathbb{R}^4$ is denoted by
$x_1^k x_2^l x_3^m x_4^n[x_1x_2x_3]$. For example, $a[x_ix_jx_k]$ is the
tetrahedron of vertices $a$, $ax_i$, $ax_ix_j$, and
$ax_ix_jx_k \in \mathbb{Z}[x_1,x_2,x_3,x_4]$.

Let $\pi$ be the projection of the collection $S$ of all the slant
tetrahedrons onto the collection $B$ of all the flat tetrahedrons along the
direction $(-1,-1,-1,-1)$, where the image of $a[x_ix_jx_k] \in S$ is denoted
by $|a[x_ix_jx_k]|$ (figure 5 (a)). Then, the projection $\pi$ induces a
tangent bundle-like structure $TB$ over $B$, where the gradient of a slant
tetrahedron is defined as follows:

Definition 4. The gradient $Da[x_ix_jx_k]$ of $a[x_ix_jx_k] \in S$ is the
monomial $x_ix_jx_k \in \mathbb{Z}[x_1,x_2,x_3,x_4]$. In particular, there is
a one-to-one correspondence between $TB$ and
$\{x_1x_2x_3, x_1x_2x_4, x_1x_3x_4, x_2x_3x_4\} \times B$. We indicate the
gradient value over a flat tetrahedron by a bold edge as shown in
figure 5 (b), where the arrows of slant tetrahedrons indicate the direction
of "down" in $\mathbb{R}^4$.

For example, the slant tetrahedrons $a/x_4[x_4x_1x_2]$, $a[x_1x_2x_3]$,
$ax_1[x_2x_3x_4]$, and $ax_1x_2[x_3x_4x_1] \in S$ are projected onto the same
flat tetrahedron $|a[x_1x_2x_3]| \in B$ and their gradients are $x_1x_2x_4$,
$x_1x_2x_3$, $x_2x_3x_4$, and $x_1x_3x_4$ respectively (figure 5 (b)).

Figure 6: Local trajectory. (a): The local trajectory specified by
$a[x_3x_4x_1] \in S$ at $|a[x_3x_4x_1]| \in B$: $|ax_3[x_4x_1x_3]|$
(downward) and $|a/x_1[x_1x_3x_4]|$ (upward) $\in B$. (b): Smoothness
condition at $|a[x_3x_4x_1]| \in B$ (white) specified by
$a[x_3x_4x_1] \in S$. The next tetrahedron $|ax_3[x_4x_1x_3]| \in B$ (grey)
could assume either $x_1x_3x_4$ or $x_1x_2x_4$ as gradient.

Then, a gradient value over a flat tetrahedron specifies a local trajectory at 
the flat tetrahedron as follows: 

Definition 5. The local trajectory defined by $a[x_ix_jx_k] \in S$ at
$|a[x_ix_jx_k]| \in B$ is the three consecutive flat tetrahedrons
$\{|ax_i[x_jx_kx_i]|, |a[x_ix_jx_k]|, |a/x_k[x_kx_ix_j]|\} \subset B$. As
figure 6 (a) shows, these are the adjacent tetrahedrons connected along the
direction of the bold edge of $|a[x_ix_jx_k]|$. The local trajectory is
specified uniquely by the gradient of $a[x_ix_jx_k]$.

Now we impose a kind of "smoothness condition" as shown in figure 6 (b).
That is, each flat tetrahedron assumes one of two gradient values, which are
determined naturally by the gradient of the preceding tetrahedron. Suppose
that the gradient at the current tetrahedron $|a[x_3x_4x_1]| \in B$ is
$x_1x_3x_4$ and the gradient at the next tetrahedron
$|ax_3[x_4x_1x_3]| \in B$ is either $x_2x_3x_4$ or $x_1x_2x_3$. Then the bold
edges of the two flat tetrahedrons $|a[x_3x_4x_1]|$ and $|ax_3[x_4x_1x_3]|$
are not connected smoothly, as shown in figure 6 (b). We exclude these cases
and permit only $x_1x_3x_4$ or $x_1x_2x_4$ as the gradient of
$|ax_3[x_4x_1x_3]|$.

As an example, let's consider a closed trajectory of peaks and valleys
specified by three peaks $a = x_1x_2x_4$, $b = x_1x_3x_4$, and
$c = x_2x_3x_4 \in \mathbb{Z}[x_1,x_2,x_3,x_4]$ (figure 7). Peaks and valleys
define a "smooth" vector field $V$ on $B$ by the following mapping:

$$V : B \to \{x_1x_2x_3, x_1x_2x_4, x_1x_3x_4, x_2x_3x_4\}, \qquad
V(|a[x_ix_jx_k]|) := x_ix_jx_k,$$

where $a[x_ix_jx_k] \in S$ is the slant tetrahedron on the surface of the
peaks and valleys over $|a[x_ix_jx_k]| \in B$.

Let's start from tetrahedron $|a[x_3x_4x_1]|$ (grey) and move downward:
$t[0] = |a[x_3x_4x_1]|$ and $V(t[0]) = x_1x_3x_4$. Then, the gradient
$V(t[0])$ specifies the local trajectory
$\{|b[x_2x_4x_1]|, |a[x_3x_4x_1]|, |a[x_3x_4x_2]|\}$ at $t[0]$. Since we move
downward, the next tetrahedron $t[1]$ is $|b[x_2x_4x_1]|$ and we obtain
$V(t[1]) = x_1x_2x_4$. Then, the gradient $V(t[1])$ specifies the local
trajectory $\{|b[x_2x_4x_3]|, |b[x_2x_4x_1]|, |a[x_3x_4x_1]|\}$ at $t[1]$,
and the next tetrahedron $t[2]$ is $|b[x_2x_4x_3]|$. Continuing the process,
we obtain the closed trajectory of tetrahedrons.

  Trajectory                    Gradient                  2nd Deriva.
  $t[0] = |a[x_3x_4x_1]|$       $V(t[0]) = x_1x_3x_4$     $DV(t[0]) = D$
  $t[1] = |b[x_2x_4x_1]|$       $V(t[1]) = x_1x_2x_4$     $DV(t[1]) = U$
  $t[2] = |b[x_2x_4x_3]|$       $V(t[2]) = x_2x_3x_4$     $DV(t[2]) = D$
  $t[3] = |c[x_1x_4x_3]|$       $V(t[3]) = x_1x_3x_4$     $DV(t[3]) = U$
  $t[4] = |c[x_1x_4x_2]|$       $V(t[4]) = x_1x_2x_4$     $DV(t[4]) = D$
  $t[5] = |a[x_3x_4x_2]|$       $V(t[5]) = x_2x_3x_4$     $DV(t[5]) = U$

Figure 7: A closed trajectory of the vector field specified by three peaks
$a = x_1x_2x_4$, $b = x_1x_3x_4$, and
$c = x_2x_3x_4 \in \mathbb{Z}[x_1,x_2,x_3,x_4]$.

Note that the trajectory can be obtained by folding the tetrahedron
sequence mentioned above (figure 1 (b)).
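The two local trajectories quoted in this example can be reproduced from Definition 5. The following is a sketch, assuming that monomials are stored as exponent tuples and that flat tetrahedrons are compared as sets of lattice points modulo $(1,1,1,1)$; the helper names are hypothetical. The three tetrahedrons are compared as unordered sets, because the text lists them in the order in which the trajectory is walked rather than in the order of the definition.

```python
def shift(a, i, delta):
    """Multiply (delta=+1) or divide (delta=-1) monomial a by x_i (1-based i)."""
    e = list(a)
    e[i - 1] += delta
    return tuple(e)


def slant(a, i, j, k):
    """Vertices a, a*x_i, a*x_i*x_j, a*x_i*x_j*x_k of a[x_i x_j x_k]."""
    v1 = shift(a, i, 1)
    v2 = shift(v1, j, 1)
    return (a, v1, v2, shift(v2, k, 1))


def flat(vertices):
    """Image under projection along (1, 1, 1, 1), as a set of canonical points."""
    return frozenset(tuple(c - min(v) for c in v) for v in vertices)


def local_trajectory(a, i, j, k):
    """|a x_i [x_j x_k x_i]|, |a[x_i x_j x_k]|, |a/x_k [x_k x_i x_j]|."""
    return (
        flat(slant(shift(a, i, 1), j, k, i)),
        flat(slant(a, i, j, k)),
        flat(slant(shift(a, k, -1), k, i, j)),
    )


a = (1, 1, 0, 1)  # peak a = x1 x2 x4
b = (1, 0, 1, 1)  # peak b = x1 x3 x4

at_t0 = set(local_trajectory(a, 3, 4, 1))  # slant tetrahedron a[x3 x4 x1]
at_t1 = set(local_trajectory(b, 2, 4, 1))  # slant tetrahedron b[x2 x4 x1]
print(len(at_t0), len(at_t1))
```

With these peaks, the local trajectory at $t[0]$ contains $|b[x_2x_4x_1]|$ and $|a[x_3x_4x_2]|$, and the one at $t[1]$ contains $|b[x_2x_4x_3]|$ and $|a[x_3x_4x_1]|$, as stated above.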



3.2 Encoding of the shape of trajectories 

Thanks to the smoothness condition again, the variation of the gradient,
i.e., the "second derivative", along a tetrahedron trajectory is also given
as a binary-valued sequence.

Definition 6. The derivative $DV$ of the vector field $V$ along a trajectory
$\{t[i]\}$ is defined as follows:

$$DV : B \to \{U, D\}, \qquad
DV(t[i]) := \begin{cases}
DV(t[i-1]) & \text{if } V(t[i]) = V(t[i-1]), \\
-DV(t[i-1]) & \text{otherwise,}
\end{cases}$$

where $-U := D$ and $-D := U$. In words, change the value if the gradient
changes.

As an example, let's consider the trajectory of figure 7 again. First, set
any initial value: $DV(t[0]) = D$. Then, since the first two tetrahedrons
$t[0]$ and $t[1]$ have different gradient values, $DV(t[1])$ is $U$. The
third tetrahedron $t[2]$ assumes yet another gradient, and the value of the
second derivative changes back to $D$. Continuing the process, we obtain a
binary sequence of length six, DUDUDU, which describes the shape of the
trajectory.

Figure 8: Encoding of local structure. Left: A polygonal chain which
represents the structure of an amino acid fragment to be encoded. Middle:
Approximation by a folded tetrahedron sequence. Right: Variation of gradient
along the folded tetrahedron sequence.

4 Encoding of local protein structure 

Now let's encode local protein structure using the variation of the gradient
along a trajectory of tetrahedrons.

To study the local structure of a protein, i.e., a polygonal chain, we
consider all the amino acid fragments of length five occurring in the
protein. (It will turn out that length five is enough to detect local
features.) The polygonal chains are approximated by folded tetrahedron
sequences to detect their local features, where we permit translation and
rotation during the folding process to absorb irregularity of the structure
(figure 8).

Each fragment is approximated by a folded tetrahedron sequence of length
five, starting from the amino acid at the middle point, say A. The variation
of the gradient along the sequence is then computed to encode its structural
features. We call the resulting $\{D, U\}$-valued sequence of length five the
5-tile code of A.

4.1 Encoding algorithm 

In the following, we explain the algorithm of "tetrahedron folding with
translation and rotation." As an example, let's consider the polygonal chain
AA[-2]-AA[-1]-AA[0]-AA[1]-AA[2] of figure 9 (a) and compute the 5-tile code
of AA[0] using a sequence of five tetrahedrons T[-2]-T[-1]-T[0]-T[1]-T[2].

4.1.1 Step 1 

Align tetrahedron T[0] (white) with amino acid AA[0] and set initial values
(figure 9 (b)). In this example, the gradient and second derivative of T[0]
are $x_1x_2x_4$ and D respectively.

Then, the initial positions of the adjacent tetrahedrons T[±1] (grey) are
also determined. They are moved to the positions of AA[±1] respectively
later.

4.1.2 Step 2 

Assign a gradient to each of the adjacent tetrahedrons T[±1], considering
the direction of AA[±2] respectively (figure 9 (c)). For example, tetrahedron
T[1] could assume $x_1x_2x_4$ or $x_2x_3x_4$ as its gradient. The next
tetrahedron (grey) becomes closer to AA[2] if $x_2x_3x_4$ is assumed. Thus,
the gradient of T[1] is $x_2x_3x_4$ and its second derivative is U since the
gradients of T[0] and T[1] are different. In the same way, T[-1] is assigned
$x_1x_2x_4$ and D as its gradient and second derivative respectively.

Figure 9: Algorithm of the 5-tile coding. (a): Polygonal chain
AA[-2]-AA[-1]-...-AA[2] which represents the structure of an amino acid
fragment to be encoded. (b): Step 1. (c): Step 2. (d): Step 3. (e): Step 4.
(f): Step 5. (g): Step 6. The character strings show the corresponding
sequences of gradients (left) and second derivatives (right), where the top
entries are those of T[-2] and the bottom entries are those of T[2]. K, L, M,
and N stand for $x_2x_3x_4$, $x_1x_2x_4$, and $x_1x_3x_4$ respectively.

Note that the initial positions of adjacent tetrahedrons T[±2] (grey) are also 
determined, which are moved to the positions of AA[±2] respectively later. 

4.1.3 Step 3

Translate tetrahedrons T[±1] to the positions of AA[±1] respectively
(figure 9 (d)). The adjacent tetrahedrons T[±2] (grey) are also moved with
T[±1] respectively.

4.1.4 Step 4

Rotate tetrahedrons T[±1] at the positions of AA[±1] so that the bold edges
become parallel to the direction from AA[0] to AA[±2] respectively
(figure 9 (e)). The adjacent tetrahedrons T[±2] (grey) are also moved with
T[±1] respectively.

4.1.5 Step 5

Assign a gradient to each of the adjacent tetrahedrons T[±2], considering
the direction of AA[±2] respectively (figure 9 (f)). For example, tetrahedron
T[2] could assume $x_1x_3x_4$ or $x_2x_3x_4$ as its gradient. The next
tetrahedron (not shown) becomes closer to AA[2] if $x_1x_3x_4$ is assumed.
Thus, the gradient of T[2] is $x_1x_3x_4$ and its second derivative is D
since the gradients of T[1] and T[2] are different. In the same way, T[-2] is
assigned $x_1x_2x_3$ and U as its gradient and second derivative
respectively.

4.1.6 Step 6

Translate tetrahedrons T[±2] to the positions of AA[±2] respectively
(figure 9 (g)). We have now obtained the binary sequence UDDUD, the 5-tile
code of AA[0], which describes the shape of the amino acid fragment shown in
figure 9 (a).
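The bookkeeping of second derivatives in Steps 1 to 6 can be sketched as follows. The geometric part of the algorithm (alignment, translation, rotation, and the choice of the gradient that brings the next tetrahedron closer to the chain) is not modelled here; the five gradients are taken as given from the worked example. That the value is propagated outward from T[0] in both directions is inferred from that example, and the function name `five_tile_code` is illustrative.

```python
def five_tile_code(gradients, initial="D"):
    """Second derivatives of T[-2..2], propagated outward from the middle T[0].

    `gradients` lists the gradient labels of T[-2], T[-1], T[0], T[1], T[2].
    """
    if len(gradients) != 5:
        raise ValueError("expected exactly five gradients")
    flip = {"U": "D", "D": "U"}
    code = [None] * 5
    code[2] = initial                      # Step 1: initial value at T[0]
    for direction in (1, -1):              # towards T[2], then towards T[-2]
        for distance in (1, 2):
            idx = 2 + direction * distance
            inner = idx - direction        # neighbour one step closer to T[0]
            same = gradients[idx] == gradients[inner]
            code[idx] = code[inner] if same else flip[code[inner]]
    return "".join(code)


# Gradients assigned to T[-2], T[-1], T[0], T[1], T[2] in Steps 1, 2, and 5.
example = five_tile_code(["x1x2x3", "x1x2x4", "x1x2x4", "x2x3x4", "x1x3x4"])
print(example)
```

For the gradients of the worked example this reproduces the code UDDUD obtained in Step 6.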

4.2 One-letter representation of 5-tile codes 

To save space, we use numerals and letters of the alphabet to denote a 5-tile
code $C_1C_2C_3C_4C_5$. First, compute the value $Y$ of the code, which is
defined as follows:

$$Y = 2^4 C'_1 + 2^3 C'_2 + 2^2 C'_3 + 2 C'_4 + C'_5,$$

where $C'_i = 1$ if $C_i$ is equal to $U$ and $C'_i = 0$ if not. Then, assign
the number $Y$ to the code if $Y$ is less than 10. Otherwise, assign the
$(Y-9)$-th letter of the alphabet to the code.

For example, DDDUU corresponds to the binary number 00011 and $Y = 3$. Thus,
3 is assigned to the code. On the other hand, DUDUD corresponds to the binary
number 01010 and $Y = 10$. Thus, the first letter A is assigned to the code.
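The one-letter representation is a direct base-2 conversion, sketched below. The function name `one_letter` is an illustrative choice; the mapping itself follows the definition of $Y$ above.

```python
import string


def one_letter(code):
    """Map a 5-tile code such as 'DUDUD' to its one-character symbol."""
    if len(code) != 5 or set(code) - {"U", "D"}:
        raise ValueError("a 5-tile code consists of five characters U or D")
    # Read U as binary digit 1 and D as binary digit 0.
    y = int(code.replace("U", "1").replace("D", "0"), 2)
    # Values below 10 keep their numeral; Y >= 10 maps to the (Y-9)-th letter.
    return str(y) if y < 10 else string.ascii_uppercase[y - 10]


print(one_letter("DDDUU"), one_letter("DUDUD"))
```

Since $Y$ ranges from 0 to 31, the symbols run from 0 to 9 and then from A to V.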

4.3 Example: transferase 1RKL 

The local structure of transferase 1RKL shown in figure 1 (a) is encoded as
follows:

    MISDEQLNSL AITFGIVMMT LIVIYHAVDS TMSPKN
    ••000RQAAA AAAAHAAAAA AAAAAAB0R0 0000••,

where the top row shows the amino acid sequence of 1RKL and the bottom row
shows the corresponding 5-tile codes. As the encoding shows, we can capture
recurring structural features without any structural templates
(figure 1 (c)).
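To read the code string above, each symbol can be expanded back into its 5-tile code. The sketch below inverts the one-letter representation of Section 4.2 and tallies the symbols of the 1RKL encoding; the function name `decode` is hypothetical, and the two positions marked • at each end, which carry no symbol, are left out of the string.

```python
from collections import Counter


def decode(symbol):
    """Expand a one-character symbol (0-9, A-V) into its U/D 5-tile code."""
    y = int(symbol) if symbol.isdigit() else ord(symbol) - ord("A") + 10
    if not 0 <= y < 32:
        raise ValueError("symbol outside the range 0-9, A-V")
    # Five binary digits, most significant first; 1 is U and 0 is D.
    return format(y, "05b").replace("1", "U").replace("0", "D")


# 5-tile symbols of 1RKL, copied from the bottom row of the encoding above.
codes_1rkl = "000RQAAA" + "AAAAHAAAAA" + "AAAAAAB0R0" + "0000"

counts = Counter(codes_1rkl)
for symbol, count in counts.most_common():
    print(symbol, decode(symbol), count)
```

The tally makes the recurrence visible: a single symbol, A (DUDUD), accounts for more than half of the 32 encoded positions.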

In previous works, common short structural motifs (structural templates) of
proteins are often identified by clustering a set of representative protein
fragments using unsupervised machine learning. Identification and assignment
of such motifs has therefore been the challenge for local structure
analysis, and, as a result, those methods could recognize neither new local
structural features nor structural distortions.

In our method, on the other hand, there is no need for identification and
assignment of structural templates, since we do not use them at all.
Moreover, the 5-tile codes can detect both new local features and structural
distortions because they are computed directly from atomic coordinates.






References 

[1] C. Bystroff, and D. Baker, Prediction of local structure in proteins
using a library of sequence-structure motifs, J. Mol. Biol., 281(1998), pp.
565-77.

[2] A. G. de Brevern, C. Etchebest, and S. Hazout, Bayesian Probabilistic
Approach for Predicting Backbone Structures in Terms of Protein Blocks,
Proteins, 41(2000), pp. 271-287.

[3] M. Rooman, J. Rodriguez, and S. Wodak, Automatic definition of
recurrent local structure motifs in proteins, J. Mol. Biol., 213-2(1990), pp.
328-336.

[4] O. Sander, I. Sommer, and T. Lengauer, Local protein structure
prediction using discriminative models, BMC Bioinformatics, 7(2006), pp.
14-26.

[5] R. Unger, and J. L. Sussman, The importance of short structural motifs
in protein structure analysis, J. Comput. Aided Mol. Des., 7(1993), pp.
457-472.

[6] Y. H. A Ban, H. Edelsbrunner, and J. Rudolph, Interface surface
for protein-protein complexes, Proc. 8-th Int'l Conf. Res. Comput. Mol.
Bio., (2004), pp. 205-212.

[7] F. Cazals, F. Chazal, and T. Lewiner, Molecular Shape Analysis
Based upon the Morse-Smale Complex and the Connolly Function, Proc.
19-th ACM Sympo. on Comput. Geom., (2003), pp. 351-360.

[8] W. R. Taylor, and A. Aszodi, Protein Geometry, Classification,
Topology and Symmetry - A computational analysis of structure -, Institute
of Physics Publishing Ltd., London. 2005.

[9] P. Rogen, and B. Fain, Automatic classification of protein structure by
using Gauss integrals, Proc. Natl. Acad. Sci., 100(2003), pp. 119-124.

[10] S. Rackovsky, and H. A. Scheraga, Differential Geometry and Polymer
Conformation. 1, Macromolecules, 11(1978), pp. 1168-1174.

[11] E. D. Demaine, S. Langerman, and J. O'Rourke, Geometric Restrictions
on Producible Polygonal Protein Chains, Algorithmica, 44-2(2006),
pp. 167-181.

[12] W. R. Taylor, Protein knots and fold complexity: Some new twists,
Comput. Biol. Chem., 31(2007), pp. 151-162.

[13] H. Edelsbrunner, and E. P. Mucke, Three-dimensional alpha shapes,
ACM Trans. Graphics, 13(1994), pp. 657-660.






